
\chapter{Introduction}
        The amount of information available today is growing continuously and enormously, a phenomenon often referred to as the information explosion.
        Sources of information are evolving and expanding rapidly; the World Wide Web in particular has become a continuously growing pool of multimedia information,
        ranging from text documents to images and videos.
        \\

        This growth is driven by several factors, such as the continuous evolution of technology, which
        has penetrated nearly all aspects of life, including social communication, commerce, industry and entertainment. New devices such as smartphones keep emerging and their number of users is steadily increasing;
        these devices provide a variety of new ways of producing, sharing and retrieving information.
        \\

        However, the value of this large amount of information does not lie merely in storing it; rather, it lies in retrieving the subsets of it
        that serve particular needs and purposes, and in using the available information to derive new information. Multimedia information retrieval therefore becomes
        increasingly important.
        \\

        Information retrieval faces several challenges, particularly when dealing with huge amounts of data. These challenges
        include:
                \begin{inparaenum}[(i)]
                        \item ensuring a fast response of the retrieval system;
                        \item achieving high-quality retrieval results, i.e.\ a typical retrieval system should retrieve
                        the information relevant to a user's request (query) with a limited error rate; and
                        \item providing storage and access methods that are efficient and scalable enough to handle large-scale
                        datasets.
                \end{inparaenum}
        \\
        
        Among the various sub-fields of information retrieval, this thesis focuses on image retrieval.
        The image retrieval problem consists of retrieving a set of images from a larger image database: given a query image from the user,
        the system is required to retrieve the images that are \emph{relevant} to it. Generally, two images can be considered \emph{relevant} to each other if they contain the same objects
        or capture similar scenes.
        
        Image retrieval techniques can also be extended to videos by analyzing the frames extracted from them.
        \\
        
        Image retrieval approaches can be classified into:
        \begin{inparaenum}[1)]
                        \item concept-based approaches, where retrieval is based on the metadata associated with images, such as textual descriptions, tags and keywords, as well as the
                        textual context in which the images occur; and
                        \item content-based approaches, where low-level features of the images, such as texture, color and elementary objects within the image, are used as the basis for retrieval.
        \end{inparaenum}
        Content-based approaches are the main focus of this thesis.
        \\

        One of the challenges of image retrieval is the presence of numerous factors that complicate image matching, such as
        differences in viewpoint, lighting, scale and color distribution. Hence, the matching process should be accurate enough
        to detect similarities between objects and scene elements, yet robust and flexible enough to tolerate such variations in imaging conditions.
        
        
        \section{Motivation}
        Images are captured or created and used on a daily basis for research and entertainment, and as a tool for education and knowledge representation. The number of images available, both privately and publicly on the Internet,
        grows enormously every day owing to the increasing number of users of digital cameras and of smartphones with embedded cameras. Statistics show
        that Flickr, an image hosting and sharing website, has reached a total of about 4 billion images, with about 5000 new images being uploaded per minute.
        \\

        Owing to decreasing hardware costs and rapid technological progress in storage media, storing such large-scale image collections
        is becoming less of a problem. However, organizing huge image databases and retrieving useful information from them are becoming increasingly crucial challenges.
        \\
        
        Consequently, investigating methods and developing tools for image retrieval is of increasing importance.
        Such tools are required to address the scalability problem and to ensure high retrieval performance in terms of both quality and
        speed.
        \\
        
        Image retrieval has numerous fields of application. For example, it is becoming a cornerstone of forensics and crime investigation,
        where it is used to match different sorts of evidence. Searching and analyzing medical imaging records (e.g.\ X-ray images) is another main application domain of image retrieval.
        \\
        
        Moreover, image retrieval is used in duplicate detection for enforcing intellectual property law, as well as in pornography detection and child protection.
        Last but not least, recent smartphone applications focus on mobile image search, where images captured with the phone's built-in camera are used as queries to retrieve information
        about the captured objects from the Internet.
        \\
        
        These application fields create the need to develop and evaluate accurate image retrieval methods, and to provide scalable approaches that can
        handle search in large-scale image databases.


        \section{Aim}
        The main aim of this thesis is to develop and evaluate a scalable approach to image retrieval by migrating existing text document retrieval approaches and adapting
        them to image retrieval.
        \\

        An existing image retrieval approach is based on direct patch searching, where a patch refers to an individual feature extracted from an image. The main drawback of this approach is its high query time,
        since an exhaustive search over all pairs of features is performed between the query image and every database image. As an image can contain more than 1000 features, this makes the retrieval
        process very slow. Moreover, the query time increases linearly with the number of images in the database, which imposes a severe scalability limitation.
        \\
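        A rough estimate illustrates this cost. Assume, for simplicity, a database of $N$ images, each described by about $F$ local features, and a query image with a comparable
        number of features. Matching every query feature against every database feature then requires approximately
        \[
          C_{\mathrm{exhaustive}} \approx F \cdot F \cdot N
        \]
        descriptor distance computations per query. With $F \approx 1000$ features per image and a database of the size targeted in this work ($N \approx 10^5$ images),
        this amounts to about $10^3 \times 10^3 \times 10^5 = 10^{11}$ distance computations for a single query, and the count grows linearly with $N$.
        The estimate ignores the cost of an individual distance computation, which additionally grows with the dimensionality of the feature descriptors.
        \\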

        This work aims to overcome the drawbacks of this approach by using alternative methods that tackle its limited scalability. These methods include
        data space partitioning, which captures similarities between image features in the off-line stage (before query time), thereby reducing the overhead
        of the on-line stage (at query time) and avoiding exhaustive matching when a query is processed.
        \\
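        As an illustration, one common way to realize such a partitioning, assumed here for the sake of the example, is vector quantization. Off-line, the feature space
        is divided into $K$ cells represented by centers $c_1, \dots, c_K$, obtained for instance by clustering a training sample of features, and every feature $x$ is mapped
        to the index of its nearest center:
        \[
          q(x) = \arg\min_{k \in \{1, \dots, K\}} \| x - c_k \|
        \]
        Two features $x$ and $y$ are then treated as matching whenever $q(x) = q(y)$. At query time, each of the $F$ query features only has to be compared with the $K$ centers,
        i.e.\ roughly $F \cdot K$ distance computations, a cost that does not depend on the number of database images; the remaining work consists of looking up the images
        that share the resulting cell indices.
        \\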
        
        Additionally, efficient methods for storing and accessing image features are employed to further increase the efficiency of the retrieval process. These methods provide an efficient
        mapping from matching features to the images that contain them.
        \\
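        One possible form of such a mapping, borrowed from text retrieval and given here as an illustrative assumption rather than as the exact structure used later, is an inverted index.
        Suppose every feature has been assigned to one of $K$ discrete cells (``visual words''), and let $P_k$ be the list of database images containing at least one feature assigned to cell $k$.
        If $W(Q)$ denotes the set of cells occurring in a query image $Q$, a simple relevance score for a database image $I$ counts the cells it shares with the query:
        \[
          s(Q, I) = \sum_{k \in W(Q)} \mathbf{1}\left[ I \in P_k \right]
        \]
        where $\mathbf{1}[\cdot]$ equals 1 if its argument holds and 0 otherwise. Only the lists $P_k$ with $k \in W(Q)$ need to be visited, so images that share no cell
        with the query are never examined.
        \\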

        In this thesis, the scalability of the adopted approach is assessed in comparison with the exhaustive-search-based method. The query times of both approaches are compared
        as a function of the number of images in the database, in order to evaluate the ability of the system to operate efficiently on large image sets (around $10^5$ images).
        
        \section{Outline}
        \label{sec:outline}
        The following chapter presents background information on image retrieval and related topics,
        such as image features and approaches for discretizing them, including clustering and locality sensitive hashing. In addition, it introduces
        topics related to text document retrieval. Finally, the chapter summarizes related work and previous research
        in the field of image retrieval.
        \\

        Chapter 3 describes the approach adopted in this thesis and presents a detailed description of the processing
        pipeline of the developed retrieval system.
        \\

        Chapter 4 presents the experiments, including the goal of each experiment, its results, and the conclusions
        drawn from them together with their implications. Comparisons between different methods and setups are also provided.
        \\

        Chapter 5 concludes the thesis with a summary of the presented work and compares the achieved results with its original aims.
        In addition, some suggestions and recommendations for future work are presented.

